Skip to content

feat(skills): support loading skills from GitHub/Git URLs#2091

Open
dgallitelli wants to merge 6 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading
Open

feat(skills): support loading skills from GitHub/Git URLs#2091
dgallitelli wants to merge 6 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading

Conversation

@dgallitelli
Copy link
Copy Markdown
Contributor

@dgallitelli dgallitelli commented Apr 8, 2026

Summary

Closes #2090

  • Add support for remote Git repository URLs as skill sources in AgentSkills and Skill.from_url()
  • Support https://, git@, and ssh:// URLs with optional @ref suffix for branch/tag pinning (e.g., https://github.com/org/skill@v1.0.0)
  • Repositories are shallow-cloned (--depth 1) and cached locally at ~/.cache/strands/skills/ (configurable via cache_dir parameter)

Usage

from strands import Agent, AgentSkills

# Load skill directly from a GitHub repo
plugin = AgentSkills(skills=["https://github.com/dgallitelli/aws-data-agent-skill"])

# With version pinning
plugin = AgentSkills(skills=["https://github.com/org/skill-repo@v1.0.0"])

# Mixed with local skills
plugin = AgentSkills(skills=[
    "https://github.com/org/skill-repo",
    "./local-skills/my-skill",
])

# Or use the Skill classmethod directly
from strands import Skill
skills = Skill.from_url("https://github.com/org/skill-repo@main")

Changes

File Change
_url_loader.py (new) URL parsing, git clone with shallow depth, hash-based caching
skill.py Added Skill.from_url() classmethod
agent_skills.py Updated _resolve_skills() to detect URL strings; added cache_dir parameter
test_url_loader.py (new) 22 tests for URL detection, parsing, cache keys, and clone behavior
test_skill.py 6 tests for Skill.from_url()
test_agent_skills.py 4 tests for URL resolution in AgentSkills

Design decisions

  • Git clone via subprocess: No new dependencies — uses git which is universally available. Handles auth naturally (SSH keys, credential helpers, tokens).
  • Shallow clone: --depth 1 minimizes bandwidth and disk for skill repos.
  • Hash-based caching: sha256(url + ref)[:16] as cache directory name. Repeated loads are instant.
  • Graceful degradation: Failed URL clones log a warning and skip (consistent with how nonexistent filesystem paths are handled).
  • No new dependencies: Only stdlib modules (subprocess, hashlib, shutil).

Test plan

  • All 171 skill tests pass (32 new tests added)
  • Ruff lint and format pass
  • Pre-existing mypy error is unchanged (unrelated @hook decorator typing)
  • Manual test with a real public skill repo (e.g., aws-data-agent-skill)
  • Manual test with a nested real public skill repo (e.g., obra/superpowers/brainstorming)

Davide Gallitelli and others added 2 commits April 9, 2026 00:06
Add support for remote Git repository URLs as skill sources in
AgentSkills, enabling teams to share and reference skills directly
from GitHub without manual cloning.

Closes strands-agents#2090

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse GitHub web URLs like /tree/<ref>/path to extract the clone URL,
branch, and subdirectory path. This enables loading skills from
subdirectories within mono-repos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dgallitelli dgallitelli force-pushed the feat/skills-github-url-loading branch from 01a4fc3 to 83966f6 Compare April 9, 2026 00:06
@github-actions github-actions bot added size/l and removed size/l labels Apr 9, 2026
@mkmeral
Copy link
Copy Markdown
Contributor

mkmeral commented Apr 9, 2026

/strands review

@mkmeral mkmeral self-assigned this Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/vended_plugins/skills/agent_skills.py 77.77% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!


logger = logging.getLogger(__name__)

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" is evaluated at module import time. In containerized environments (Lambda, Docker) where HOME may not be set, Path.home() raises RuntimeError.

Suggestion: Evaluate this lazily, e.g., inside clone_skill_repo when cache_dir is None. You could use a sentinel or simply inline Path.home() / ".cache" / "strands" / "skills" inside the function:

cache_dir = cache_dir or Path.home() / ".cache" / "strands" / "skills"

This also respects XDG_CACHE_HOME if you want to be a good citizen on Linux:

import os
cache_dir = cache_dir or Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache")) / "strands" / "skills"

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"

# Patterns that indicate a string is a URL rather than a local path
_URL_PREFIXES = ("https://", "http://", "git@", "ssh://")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: http:// is included in URL prefixes, allowing skill repos to be cloned over plaintext HTTP. This exposes users to man-in-the-middle attacks where a malicious actor could inject arbitrary code into the cloned skill.

Suggestion: Remove "http://" from _URL_PREFIXES, or at minimum log a warning when an http:// URL is detected in clone_skill_repo. Git itself warns about this, but the SDK should guide users toward secure defaults (tenet: "the obvious path is the happy path").

key = cache_key(url, ref)
target = cache_dir / key

if not target.exists():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Race condition — if two threads/processes call clone_skill_repo concurrently with the same URL, both pass the not target.exists() check and attempt to clone into the same directory. One will fail or produce a corrupt state.

Suggestion: Use an atomic pattern: clone to a temporary directory first, then os.rename() (atomic on the same filesystem) to the target path. If the target already exists by the time rename happens, just delete the temp clone. Something like:

import tempfile

with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
    tmp_target = Path(tmp) / "repo"
    # ... clone into tmp_target ...
    try:
        tmp_target.rename(target)
    except OSError:
        # Another process won the race — that's fine, use their clone
        pass

return hashlib.sha256(key_input.encode()).hexdigest()[:16]


def clone_skill_repo(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There is no mechanism to refresh a cached clone. When no ref is specified, the default branch is cloned. If the remote repo is updated, the user gets stale content indefinitely with no obvious way to refresh (other than manually deleting ~/.cache/strands/skills/).

Suggestion: Consider adding a force_refresh: bool = False parameter (or documenting the cache invalidation story clearly). At minimum, document in the docstring that users should delete the cache directory or pin a ref for reproducibility. This is especially important since the cache key is a hash — users can't easily find and delete the right directory.

List of Skill instances loaded from the repository.

Raises:
RuntimeError: If the repository cannot be cloned or ``git`` is not
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The Raises section documents RuntimeError but the method also raises ValueError (via is_url check on line 377). Additionally, calling Skill.from_url() with a non-URL raises ValueError, but using the same string through AgentSkills(skills=["./path"]) treats it as a local path. This inconsistency could confuse users.

Suggestion: Add ValueError to the Raises section in the docstring.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: This PR introduces new public API surface (Skill.from_url() classmethod, cache_dir parameter on AgentSkills.__init__). Per the API Bar Raising guidelines, this should have the needs-api-review label since it adds a new public classmethod that customers will use directly.

Key questions an API reviewer should evaluate:

  1. Should from_url return list[Skill] (inconsistent with from_file which returns a single Skill)? The list return type is due to the possibility of multi-skill repos, but this creates an asymmetric API. Consider whether a separate from_url_directory would be cleaner, or if from_url should raise when multiple skills are found unless the user explicitly opts in.
  2. Is cache_dir the right parameter name and placement? It's only relevant when URL sources are used but appears as a top-level AgentSkills.__init__ parameter alongside unrelated parameters like state_key and max_resource_files.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: The _url_loader module is not mentioned in the directory structure in AGENTS.md. Per the AGENTS.md instructions: "After making changes that affect the directory structure (adding new directories, moving files, or adding significant new files), you MUST update this directory structure section."

Suggestion: Add _url_loader.py under the vended_plugins/skills/ section in the AGENTS.md directory tree.

from ._url_loader import clone_skill_repo, is_url, parse_url_ref

if not is_url(url):
raise ValueError(f"url=<{url}> | not a valid remote URL")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The from_url ValueError uses an f-string in the exception message: f"url=<{url}> | not a valid remote URL". This is consistent with the structured logging style used elsewhere, but the AGENTS.md specifically says "Don't use f-strings in logging calls." Since this is an exception message (not a logging call), this is acceptable — but the logging calls throughout the module correctly use %s format, which is good.

However, the RuntimeError raised in clone_skill_repo at line 189 also uses an f-string: f"url=<{url}>, ref=<{ref}> | failed to clone...". This is fine since it's an exception message, not a log statement.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Review Summary

Assessment: Request Changes

This is a well-structured feature that fills a real user need. The design decisions (subprocess git, shallow clone, hash-based caching) are sound and well-documented in the PR description. The test coverage is thorough with 32 new tests.

Review Categories
  • Security: http:// URL support exposes users to MITM risks; Path.home() at import time can fail in containers. The subprocess usage is properly handled with shell=False.
  • Reliability: Race condition in cache directory creation when concurrent processes clone the same URL. No cache invalidation mechanism for unpinned refs means stale content with no obvious path to refresh.
  • API Design: This PR adds public API surface (Skill.from_url(), cache_dir on AgentSkills) and should have the needs-api-review label. The from_url returning list[Skill] is asymmetric with from_file returning Skill — worth discussing with an API reviewer.
  • Documentation: No documentation PR linked — this adds new public API that users need to know about. AGENTS.md directory tree and set_available_skills docstring need updating.

The core implementation is solid — addressing the security and reliability items above would make this ready to merge.

- Security: remove http:// support (MITM risk), only allow https/ssh/git@
- Reliability: atomic clone via tempdir + rename to prevent race conditions
- Reliability: evaluate cache dir lazily (not at import time) for containers
- Usability: respect XDG_CACHE_HOME for cache directory
- Usability: add force_refresh parameter to re-clone stale unpinned refs
- Docs: add ValueError to Skill.from_url() Raises section
- Docs: add _url_loader.py to AGENTS.md directory tree
- Tests: add coverage for XDG_CACHE_HOME, force_refresh, race condition,
  http:// rejection (184 tests total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the size/l label Apr 10, 2026
Per maintainer feedback, remove all git/SSH/cache logic and replace
with simple HTTPS fetch of SKILL.md content via urllib (stdlib).

- Drop git clone, subprocess, caching, tempdir, race condition handling
- Only support https:// URLs (no git@, ssh://, http://)
- Auto-resolve GitHub web URLs to raw.githubusercontent.com
- Skill.from_url() now returns a single Skill (not list)
- Remove cache_dir, force_refresh params from public API
- Net -532 lines, zero new dependencies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/m and removed size/xl labels Apr 11, 2026
@dgallitelli
Copy link
Copy Markdown
Contributor Author

@mkmeral Great call — I've completely reworked this to drop all git/SSH/cache logic. The implementation is now HTTPS-only with zero new dependencies.

What changed

Removed:

  • git clone, subprocess, cache directory management, tempdir/race-condition handling
  • git@, ssh://, http:// support
  • cache_dir and force_refresh parameters

New approach:

  • Fetch SKILL.md content directly via urllib.request (stdlib)
  • Auto-resolve GitHub web URLs → raw.githubusercontent.com
  • Skill.from_url() returns a single Skill (not a list) — cleaner API

Net result: -532 lines, zero dependencies, sandbox-friendly.

Supported URL patterns

# Direct raw URL
Skill.from_url("https://raw.githubusercontent.com/org/repo/main/SKILL.md")

# GitHub repo root (resolves to raw SKILL.md at root)
Skill.from_url("https://github.com/org/skill-repo")

# GitHub repo with @ref
Skill.from_url("https://github.com/org/skill-repo@v1.0.0")

# GitHub /tree/ URL (resolves to raw SKILL.md in that directory)
Skill.from_url("https://github.com/org/repo/tree/main/skills/my-skill")

# Any HTTPS URL hosting a SKILL.md
Skill.from_url("https://example.com/path/to/SKILL.md")

All tested manually with real URLs (task-reviewer from your repo, brainstorming from obra/superpowers, data-agent from dgallitelli/aws-data-agent-skill) and via a live Bedrock agent.

Each source can be a Skill instance, a path to a skill directory,
or a path to a parent directory containing multiple skills.
a path to a parent directory containing multiple skills, or a remote
Git URL.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This docstring says "remote Git URL" but the implementation no longer uses git — it only supports https:// URLs fetched via urllib. This is stale from the v1 implementation.

Suggestion: Update to match the current implementation:

a path to a parent directory containing multiple skills, or an
HTTPS URL pointing to a SKILL.md file.

AGENTS.md Outdated
│ │ │ ├── core/ # Base classes, actions, context
│ │ │ └── handlers/ # Handler implementations (e.g., LLM)
│ │ └── skills/ # AgentSkills.io integration (Skill, AgentSkills)
│ │ └── _url_loader.py # Remote URL loading, git clone, caching
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The AGENTS.md entry for _url_loader.py says "Remote URL loading, git clone, caching" which describes the v1 implementation. The module now does HTTPS fetching with no git or caching.

Suggestion: Update to:

│   │       └── _url_loader.py            # HTTPS skill fetching, GitHub URL resolution

return url

# Not a github.com URL — return as-is (user provides a direct link)
if not url.startswith("https://github.com"):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: resolve_to_raw_url only handles GitHub-specific URL patterns (github.comraw.githubusercontent.com). For non-GitHub URLs that aren't direct links to a SKILL.md file, the URL is returned as-is. If a user passes https://gitlab.com/org/repo, the function will return it unchanged and fetch_skill_content will try to fetch an HTML page (GitLab repo page), not the raw SKILL.md content.

This isn't necessarily a bug — the function documents that non-GitHub URLs pass through — but the error experience could be confusing: the user gets a RuntimeError from a failed parse of HTML content, not a clear "unsupported host" message.

Suggestion: Consider logging a debug message for non-GitHub URLs to make it clear the SDK expects a direct link to SKILL.md content, or add a brief note in the Skill.from_url docstring that only GitHub URLs get automatic resolution and other hosts require a direct link to the raw SKILL.md file.

logger.info("url=<%s> | fetching skill content from %s", url, resolved)

try:
req = urllib.request.Request(resolved, headers={"User-Agent": "strands-agents-sdk"}) # noqa: S310
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The S310 noqa suppression is appropriate here since the URL is validated to be https:// by the caller (Skill.from_url calls is_url first). However, fetch_skill_content itself doesn't validate that url starts with https:// — it could be called directly with any URL. Since this is a private module, the risk is low, but a defensive check would make the function safer independently.

Suggestion: Consider adding a quick guard:

if not url.startswith("https://"):
    raise ValueError(f"url=<{url}> | only https:// URLs are supported")

This would make the S310 suppression more clearly justified.

if owner_repo.count("/") == 1:
return f"https://raw.githubusercontent.com/{owner_repo}/{ref}/SKILL.md"

# Unrecognised GitHub URL pattern — return as-is
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There's no test for the fallback case on line 113 where a GitHub URL has more than 2 path segments (e.g., https://github.com/org/repo/something-unexpected) and falls through to return the URL as-is. This is the "Unrecognised GitHub URL pattern" branch. Codecov also reports some missing coverage in this file.

Suggestion: Add a test case like:

def test_unrecognized_github_path(self):
    url = "https://github.com/org/repo/wiki/Some-Page"
    assert resolve_to_raw_url(url) == url

@github-actions
Copy link
Copy Markdown

Issue: This PR adds new public API surface (Skill.from_url() classmethod) that users need to know about to use the SDK correctly. The PR description doesn't include a "Documentation PR" section linking to a https://github.com/strands-agents/docs/pull/... PR.

Per the API bar raising guidelines, this is a moderate change (adding a new classmethod to an existing class that customers will use directly). The PR description is well-prepared with use cases and example code — this is great. Consider adding the needs-api-review label so an API reviewer can confirm the design before merge.

A documentation PR should be opened to document Skill.from_url() and the URL support in AgentSkills.

@github-actions
Copy link
Copy Markdown

Review Summary

Assessment: Request Changes

The v2 rewrite is a big improvement — dropping git/SSH/cache in favor of HTTPS-only fetch with urllib (stdlib) makes this much simpler and sandbox-friendly. The core implementation is clean, well-tested, and the API design (Skill.from_url() returning a single Skill) is consistent with from_file. The remaining issues are minor documentation items and one missing test branch.

Review Categories
  • Stale references: The _resolve_skills docstring and AGENTS.md entry still reference "Git URL" / "git clone, caching" from v1 — needs updating to reflect the HTTPS-only approach.
  • Documentation gaps: set_available_skills docstring doesn't mention URL support; no documentation PR linked for the new public Skill.from_url() API.
  • Test coverage: Missing test for the unrecognized GitHub URL fallback path (line 113 of _url_loader.py), which Codecov also flags.
  • Defensive coding: fetch_skill_content doesn't independently validate the URL is https://, relying on callers — a small guard would strengthen the S310 noqa justification.

Great work on the simplification — this is close to ready.

- Fix mypy error: annotate response.read().decode() return type
- Add defensive https:// check in fetch_skill_content()
- Fix stale "Git URL" docstring in _resolve_skills()
- Fix stale "git clone, caching" description in AGENTS.md
- Add note that non-GitHub URLs must point directly to SKILL.md
- Add tests for unrecognized GitHub URL fallthrough and http:// rejection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
return source.startswith("https://")


def resolve_to_raw_url(url: str) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm don't think we should be doing the raw url translation. I'd say this is an application concern, that they should pass correct URLs to the Skill.from_url

This PR should be targeted at loading skills from URL, should not be opinionated on what that url is. At the end of the day, it's up to the devs to pass in the correct url

Skill.from_url should essentially be: get url as input, make GET request to get content from the url, pass it down to Skill.from_content. We should not have any complex resolution

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — removed all the GitHub URL resolution logic in the latest push. Skill.from_url() is now just: validate https://, GET the URL, pass to from_content(). ~65 lines total in _url_loader.py, no opinions on URL format.

dgallitelli pushed a commit to dgallitelli/docs that referenced this pull request Apr 13, 2026
Document Skill.from_url() and HTTPS URL support in AgentSkills:
- New "Loading skills from URLs" section with examples
- Add from_url() to programmatic creation examples
- Update configuration table to mention URL sources
- Note about resource directory limitations for URL-loaded skills

Companion to strands-agents/sdk-python#2091

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dgallitelli
Copy link
Copy Markdown
Contributor Author

Documentation PR: strands-agents/docs#756

Per review comment, Skill.from_url() should not be opinionated about
URL format. Remove all resolve_to_raw_url logic — the method now simply
GETs the provided URL and passes content to from_content(). Users are
responsible for providing a direct link to raw SKILL.md content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Returns:
Dict mapping skill names to Skill instances.
"""
from ._url_loader import is_url
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we explicitly need url loader import in method? if not, can we move to top?

ValueError: If ``url`` is not an ``https://`` URL.
RuntimeError: If the SKILL.md content cannot be fetched.
"""
from ._url_loader import fetch_skill_content, is_url
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not have imports in methods, unless necessary. let's move this to the top

return source.startswith("https://")


def fetch_skill_content(url: str) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need another file? can we just move this under Skill.from_url (or like as helper methods there)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AgentSkills: support loading skills from GitHub URLs

2 participants